Abstract. We introduce the concept of control improvisation, the process of generating a random
sequence  of  control  events  guided  by  a  reference  sequence  and  satisfying  a  given  specification.  We 
propose  a  formal  definition  of  the  control  improvisation  problem  and  an  empirical  solution  applied 
to  the  domain  of  music.  More  specifically,  we  consider  the  scenario  of  generating  a  monophonic  Jazz 
melody  (solo)  on  a  given  song  harmonization.  The  music  is  encoded  symbolically,  with  the  improviser 
generating  a  sequence  of  note  symbols  comprising  pairs  of  pitches  (frequencies)  and  discrete  durations. 
Our  approach  can  be  decomposed  roughly  into  two  phases:  a  generalization  phase,  that  learns  from  a 
training  sequence  (e.g.,  obtained  from  a  human  improviser)  an  automaton  generating  similar  sequences, 
and  a  supervision  phase  that  enforces  a  specification  on  the  generated  sequence,  imposing  constraints 
on  the  music  in  both  the  pitch  and  rhythmic  domains.  The  supervision  uses  a  measure  adapted  from 
Normalized  Compression  Distances  (NCD)  to  estimate  the  divergence  between  generated  melodies 
and  the  training  melody  and  employs  strategies  to  bound  this  divergence.  An  empirical  evaluation  is 
presented  on  a  sample  set  of  Jazz  music. 


1  Introduction 

In traditional supervisory control, the system being controlled (aka “plant”) has some of its transitions disabled
by a controller (aka “supervisor”) in order to enforce a safety specification. Such a control strategy,
while effective for several applications, is ill-suited when the application imposes certain additional requirements.
First, in highly dynamic, adversarial environments, simply disabling transitions may disallow most
or  even  all  behaviors  in  the  plant.  To  overcome  this,  one  needs  to  be  able  to  modify  transitions,  rather  than 
simply  disable  them.  Second,  it  is  often  desirable  to  have  randomness  in  the  control  strategy.  Randomness 
can  enhance  diversity,  e.g.,  to  prevent  correlated  failures  of  replicated  systems,  or  to  prevent  an  adversary 
(“attacker”)  from  easily  inferring  (and  possibly  thwarting)  the  control  strategy.  Finally,  if  randomness  is 
employed,  one  often  needs  to  impose  the  additional  requirement  that  a  trace  of  the  random  strategy  be 
“similar”  to  a  reference  trace,  to  maintain  some  predictability. 

We  introduce  a  new  concept  that  formalizes  the  above  variation  on  the  supervisory  control  problem. 
Informally,  this  concept,  termed  control  improvisation,  is  the  process  of  generating  a  randomized  control 
strategy  producing  traces  similar  to  a  reference  sequence  and  satisfying  a  given  safety  specification.  We 
propose  a  formal  definition  of  the  control  improvisation  problem  and  an  approach  to  solve  it.  There  are  several 
interesting  applications  that  require  control  improvisation.  One  application  concerns  control  in  an  emergency 
situation,  such  as  an  earthquake,  where  the  environment  deviates  greatly  from  its  specification  HD.  Another 
application  is  to  home  automation,  where  for  example,  the  lighting  in  a  home  can  be  programmed  to  switch 
randomly  when  occupants  are  away,  but  still  satisfying  constraints  (no  more  than  a  certain  number  on  at 
a  time),  and  mimicking  typical  occupant  behavior  |2f)j.  In  this  paper,  we  demonstrate  our  ideas  with  the 
problem  of  music  improvisation ,  a  compelling  application  that  combines  all  three  additional  requirements 
identified  above. 

Music  can  be  generated  either  at  the  audio  or  at  a  symbolic  level.  The  former  involves  processing  and 
synthesizing  sound  waves,  whereas  the  latter  is  concerned  only  with  generating  scores ,  i.e. ,  sequences  of 
(groups  of)  symbols,  the  notes,  each  of  them  being  an  abstract  representation  of  a  particular  sound  that 
can  be  instantiated  in  many  different  variations  by  different  instruments  or,  generally  speaking,  by  sound 
synthesizers. Thus, at the symbolic level, generation of music is the same as generating sequences of letters,
each of which corresponds to a note. Music improvisation is a special case of music generation where one
generates  a  random  variant  of  a  given  melody  (sequence  of  notes).  The  field  of  music  improvisation,  also 
termed machine improvisation, has been well studied [24]. One approach to improvisation is data-driven,
wherein  recurrent  patterns  are  inferred  from  the  reference  melody,  and  then  replicated  and  recombined  to 
form  the  improvisation.  Different  data  structures  and  algorithms  have  been  proposed  for  this  purpose  such 
as incremental parsing (IP) [14], inspired by dictionary-based compression algorithms from the Lempel-Ziv
family [3], probabilistic suffix trees (PST) [T^, and factor oracles (FO) [2]. Another approach is rule-based,
where an expert encodes rules in a formal system such as a stochastic context-free grammar, from which
sequences are generated [18]. Both approaches, however, lack certain desirable properties. First, certain rules
need  to  be  enforced  always,  much  like  safety  properties.  Second,  it  is  often  desirable  to  control  the  amount 
of  “creativity”  in  the  improvisation,  using  some  kind  of  divergence  measure.  We  present  a  more  detailed 
discussion of related work in Section 7.

Our definition of control improvisation is thus a good fit for revisiting the problem of music improvisation.
Specifically, we consider the scenario of generating a monophonic (solo) melody over a given Jazz song
harmonization.  The  improvised  sequence  has  to  be  synchronized  with  another  sequence,  usually  the  chord 
progressions,  considered  as  fixed  and  called  hereafter  the  accompaniment.  The  improviser  then  has  to  be  a 
function  of  the  training  sequence,  the  accompaniment,  and  other  imposed  constraints  such  as  the  “safety” 
rules  and  divergence  measure.  We  present  an  approach  to  solving  the  control  improvisation  problem  for  this 
specific  application.  Our  approach  has  three  phases.  The  first  phase,  generalization,  learns  from  the  given 
melody  a  (non-deterministic)  automaton  generating  a  set  of  melodies  containing  the  original.  We  implement 
this phase using factor oracles [2]. The second phase, safety supervision, enforces rules on the generalized
automaton so that it plays in harmony with the accompaniment. The rules are analogous to “safety properties”
that a control system must always obey. The third and final phase, divergence supervision, ensures that
sequences  produced  by  the  improviser  automaton  lie,  with  high  probability,  within  a  specified  “similarity” 
divergence  from  the  original.  This  phase  is  implemented  by  replacing  non-determinism  in  the  improviser 
automaton  with  probabilities,  which  are  based  on  a  given  divergence  measure.  For  music,  several  divergence 
measures have been proposed, often for the purpose of genre classification; amongst these, Normalized Compression
Distances (NCDs) have been effectively used [8], and so we employ a variant of an NCD in this
paper.

In  summary,  the  main  novel  contributions  of  this  paper  are: 

• The notion of control improvisation, its formal definition, and an analysis of its computational hardness
(Sections 2 and 3);

• An approach to solve the control improvisation problem based on generalization, safety supervision, and
divergence supervision (Section 4), and

• An instantiation and application of our approach to improvisation of Jazz melodies (Sections 5 and 6).

2  Control  Improvisation 

In  this  section,  we  define  formally  the  control  improvisation  problem,  which  is  a  variant  of  a  controller 
synthesis  problem.  The  main  application  presented  in  this  work  is  music  improvisation,  and  more  specifically 
we  consider  here  only  the  symbolic  aspect  of  music.  This  means  we  leave  aside  the  problems  of  synthesizing 
or  analysing  sound  signals,  which  would  require  real-valued  and  continuous-time  signal  processing,  and  work 
with traditional score notation which is based only on discrete sets, namely a discrete set of pitches (e.g.,
a4, c2, g3, etc.) and a discrete set of durations (quarter notes ♩, eighth notes ♪, etc.). As a consequence, the
formal  background  can  be  set  up  in  terms  of  finite  state  automata. 


2.1  Notation  and  Background 

Definition 1. A finite state automaton (FSA) is a tuple $A = (Q, q_0, F, \Sigma, \rightarrow)$ where $Q$ is a set of states,
$q_0 \in Q$ is the initial state, $F \subseteq Q$ is the set of accepting states, $\Sigma$ is a finite set called the alphabet and
$\rightarrow \subseteq Q \times (\Sigma \cup \{\epsilon\}) \times Q$ is a transition relation for which we use the usual infix notation $q \xrightarrow{\sigma} q'$ to mean that
$(q, \sigma, q') \in \rightarrow$, and $\epsilon$ is the empty word.

We interpret letters of the alphabet as observable events of the system under consideration. A word $w$ is
either $\epsilon$ or a finite sequence of letters in $\Sigma$, i.e. $w = \sigma_1 \sigma_2 \ldots \sigma_k$ for some integer $k \geq 1$. The length of a word
is defined inductively as $|\epsilon| = 0$ and $|w\sigma| = |w| + 1$ for all $\sigma \in \Sigma$. A word is a trace of an FSA $A$ iff there exists a
sequence of states $q_i \in Q$ such that $q_0 \xrightarrow{\sigma_1} q_1 \xrightarrow{\sigma_2} \ldots \xrightarrow{\sigma_{n-1}} q_{n-1} \xrightarrow{\sigma_n} q_n$. It is an accepting trace of $A$ iff $q_n$ is
in $F$. The language of $A$, noted $\mathcal{L}(A)$, is the set of accepting traces of $A$.
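
As an illustration of Definition 1, the following minimal Python sketch checks whether a word is a trace or an accepting trace of an FSA. The class name `FSA`, the use of `None` for the empty word, and the set-of-triples encoding of the transition relation are assumptions made for this example only; they are not part of the original formalism.

```python
EPS = None  # assumption: None stands for the empty word (epsilon)

class FSA:
    """Finite state automaton with transitions given as (q, letter, q') triples."""

    def __init__(self, initial, accepting, transitions):
        self.initial = initial
        self.accepting = set(accepting)
        self.delta = {}
        for q, a, q2 in transitions:
            self.delta.setdefault((q, a), set()).add(q2)

    def _eps_closure(self, states):
        # All states reachable through epsilon-transitions only.
        stack, seen = list(states), set(states)
        while stack:
            q = stack.pop()
            for q2 in self.delta.get((q, EPS), ()):
                if q2 not in seen:
                    seen.add(q2)
                    stack.append(q2)
        return seen

    def reachable_after(self, word):
        # Set of states in which some run over `word` can end.
        current = self._eps_closure({self.initial})
        for a in word:
            step = set()
            for q in current:
                step |= self.delta.get((q, a), set())
            current = self._eps_closure(step)
        return current

    def is_trace(self, word):
        return bool(self.reachable_after(word))

    def accepts(self, word):
        return bool(self.reachable_after(word) & self.accepting)


# Usage: a two-state automaton alternating the letters a and b.
A = FSA(0, {1}, {(0, "a", 1), (1, "b", 0)})
```

Here `A.accepts("aba")` holds because the run ends in the accepting state 1, whereas `"ab"` is a trace that is not accepting.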

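Definition 2. (Synchronous Product) Given two FSA with same alphabet $A^s = (Q^s, q_0^s, F^s, \Sigma, \rightarrow_s)$ and
$A^c = (Q^c, q_0^c, F^c, \Sigma, \rightarrow_c)$, the synchronous product of $A^s$ and $A^c$, noted $A^s \| A^c$, is defined as the FSA $A^s \| A^c =
(Q^s \times Q^c, (q_0^s, q_0^c), F^s \times F^c, \Sigma, \rightarrow)$ where for all $\sigma \in \Sigma \cup \{\epsilon\}$, $(q_i^s, q_i^c) \xrightarrow{\sigma} (q_j^s, q_j^c)$ if and only if $q_i^s \xrightarrow{\sigma}_s q_j^s$ and $q_i^c \xrightarrow{\sigma}_c q_j^c$.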
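
Definition 2 can be sketched in a few lines of Python. The dictionary encoding of an automaton (keys `states`, `init`, `acc`, `trans`) is a hypothetical choice for this example; epsilon-transitions are synchronized like any other label, as the definition requires.

```python
def synchronous_product(A, B):
    """Synchronous product of two automata over the same alphabet.

    Each automaton is a dict with keys 'states', 'init', 'acc' (sets/values)
    and 'trans', a set of (source, label, target) triples.
    """
    # A joint transition exists only when both components move on the same label.
    trans = {
        ((p, q), a, (p2, q2))
        for (p, a, p2) in A["trans"]
        for (q, b, q2) in B["trans"]
        if a == b
    }
    return {
        "states": {(p, q) for p in A["states"] for q in B["states"]},
        "init": (A["init"], B["init"]),
        "acc": {(p, q) for p in A["acc"] for q in B["acc"]},
        "trans": trans,
    }


# Usage: the second automaton only ever allows the letter a.
A1 = {"states": {0, 1}, "init": 0, "acc": {1},
      "trans": {(0, "a", 1), (1, "b", 0)}}
A2 = {"states": {"x"}, "init": "x", "acc": {"x"},
      "trans": {("x", "a", "x")}}
P = synchronous_product(A1, A2)
```

In this small case the product keeps the single `a` transition and drops the `b` transition, since the second automaton has no matching move.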

Note  that  here  we  consider  products  of  FSAs  sharing  the  same  alphabet  and  explicit  synchronization 
of  e-transitions.  FSAs  equipped  with  the  synchronous  product  are  sufficient  to  define  a  controller  synthesis 
problem. Our work builds upon existing results from the field of supervisory control of discrete-event systems [7].
Assume that we are given an FSA $A^p$, called the plant FSA, modeling the behavior of a system
and an FSA $A^s$, called the specification FSA, modeling specifications for this system so that accepting traces
of $A^p \| A^s$ represent desired behaviors of $A^p$. Typically, the transitions of the plant automaton are classified
as  being  controllable  (they  can  be  enabled  or  disabled)  or  uncontrollable  (they  are  always  enabled).  In  the 
context  of  our  motivating  applications,  and  for  simplicity,  all  transitions  can  be  considered  as  controllable. 
Note also that one can view $A^s$ as a safety specification, since it identifies all finite-length sequences that
are  good  (bad)  behaviors  of  the  system. 

A supervisory controller for $A^p$ is then an FSA $A^c$ which, when composed with $A^p$, will disable (controllable)
transitions leading to non-accepting traces of $A^s$. In other words, $\mathcal{L}(A^p \| A^c) \subseteq \mathcal{L}(A^s)$. A supervisory
controller is said to be non-blocking if it always allows the composite system $A^p \| A^c \| A^s$ to reach an accepting
state. It is said to be maximally permissive when it does not disable more transitions than strictly necessary.
There is a simple, well-known algorithm for finding a non-blocking, maximally-permissive, memoryless supervisory
controller, when one exists. Informally, the algorithm is based on locating “bad” (blocking) states
in  the  composite  automaton  and  then  iteratively  pruning  away  controllable  transitions  to  such  states,  while 
marking  as  “bad”  states  any  uncontrollable  predecessors  of  existing  “bad”  states  or  new  blocking  states.  The 
reader  is  referred  to  the  book  by  Cassandras  and  Lafortune  for  further  details  [7] . 
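
The pruning idea can be sketched as follows, restricted to the case where all transitions are controllable (the simplification adopted in the text). The function name `prune_blocking` and the triple encoding of transitions are assumptions for this example; the sketch does not handle uncontrollable transitions, for which the reader should consult [7].

```python
def prune_blocking(init, accepting, trans):
    """Remove transitions touching blocking states of a composite automaton.

    trans is a set of (source, label, target) triples. Returns the pruned
    transition set, or None when the initial state itself is blocking.
    """
    # Backward fixpoint: states from which an accepting state is reachable.
    good = set(accepting)
    changed = True
    while changed:
        changed = False
        for (q, _, q2) in trans:
            if q2 in good and q not in good:
                good.add(q)
                changed = True
    if init not in good:
        return None  # no non-blocking controller exists
    # Keep only transitions that stay inside the non-blocking region.
    return {(q, a, q2) for (q, a, q2) in trans if q in good and q2 in good}


# Usage: state 2 can never reach the accepting state 1, so it is pruned.
composite = {(0, "a", 1), (0, "b", 2), (2, "c", 2)}
pruned = prune_blocking(0, {1}, composite)
```

With every transition controllable, a single backward reachability pass is enough, because any path to an accepting state only visits states that are themselves non-blocking.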

The  framework  of  supervisory  control,  while  relevant,  is  not  sufficient  for  our  setting  of  improvisation. 
There  are  two  main  differences: 

(i) Randomness: To improvise is to incorporate some randomness (“unpredictability”), whereas traditional
supervisory control seeks to find safe, deterministic strategies, and
(ii) Bounded Divergence: The improvisation is created from a reference trace $w_{\mathrm{ref}}$, and is typically “similar”
to it. The problem definition should capture this constraint.

We therefore define a new controller synthesis problem, termed the control improvisation problem, in the
following  section. 

2.2  Problem  Definition 

The aim of control improvisation is to randomly generate traces among a family of “safe” traces which are
equivalent based on some creativity measure. Intuitively, a controller that uniformly samples traces from the
safe set is the most creative, and a deterministic controller that replicates the reference trace $w_{\mathrm{ref}}$ is the
least creative. Our goal is to find a controller of “intermediate creativity.” Of course, creativity is a vague,
rather qualitative and subjective notion. Notwithstanding this, we assume that it can be measured by a
non-negative function $d_{w_{\mathrm{ref}}}$ on words, such that $d_{w_{\mathrm{ref}}}(w_{\mathrm{ref}}) = 0$ and $d_{w_{\mathrm{ref}}}(w)$ increases as $w$ gets “further”
from $w_{\mathrm{ref}}$. The control improvisation problem is then defined with respect to $w_{\mathrm{ref}}$ and $d_{w_{\mathrm{ref}}}$ in addition
to the plant $A^p$ and specification $A^s$. A controller solving the control improvisation problem resolves the
non-determinism in $A^p$ in two ways:
1. When several transitions of $A^p$ are safe with respect to $A^s$, one is picked following a random distribution
in accordance with the creativity criterion;

2. When no safe transition is available, one transition of $A^p$ is modified (replaced with alternative transitions
to the same end state but labeled with a different event) to prevent blocking while still preserving safety.

In  addition  to  this,  we  require  that  the  process  generates  accepting  words  of  a  minimal  length  n.  This  is 
achieved  by  running  the  improvisation  for  at  least  n  events,  and  then  until  an  accepting  state  is  reached. 
Formally,  the  control  improvisation  problem  is  defined  as  follows: 

Definition 3. (Control Improvisation Problem) A control improvisation problem $\mathcal{P}_i$ is an eight-tuple
$(A^p, A^s, n, w_{\mathrm{ref}}, d_{w_{\mathrm{ref}}}, I, \epsilon, \rho)$ where $A^p$ is a (possibly non-deterministic) plant FSA, $A^s$ is a specification FSA,
$n \in \mathbb{N}$, $w_{\mathrm{ref}}$ is an accepting trace of $A^s$ of length $n$, $d_{w_{\mathrm{ref}}}$ its associated creativity measure, $I = [\underline{d}, \overline{d}]$ is an
interval of $\mathbb{R}$, $\epsilon \in (0,1)$, and $\rho \in (0,1]$. A solution of $\mathcal{P}_i$ is a stochastic process generating words $w$ in $\Sigma^*$
such that the following conditions hold for each $w$:

(a) Minimal Length: $|w| \geq n$;

(b) Safety: $w$ is an accepting trace of $A^s \| A^p$;

(c) Randomness: The measure of $w$ is smaller than $\rho$, and

(d) Bounded Divergence: $\Pr(d_{w_{\mathrm{ref}}}(w) \in [\underline{d}, \overline{d}]) > 1 - \epsilon$.

Typically, all states of $A^p$ are accepting states, and therefore the Safety condition (b) is determined by $A^s$.
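
Condition (d) is a statement about a probability, so for a given sampler it can only be estimated empirically. The sketch below is a plain Monte Carlo estimate, not a verification procedure; the names `sample_word`, `divergence`, `d_low` and `d_high` are hypothetical stand-ins for the stochastic process, the creativity measure, and the bounds of the interval $I$.

```python
import random

def estimate_divergence_probability(sample_word, divergence, d_low, d_high,
                                    trials=1000, seed=0):
    """Estimate Pr(divergence(w) in [d_low, d_high]) by repeated sampling.

    sample_word(rng) must return one generated word; divergence(w) must
    return a number. Both are supplied by the caller.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if d_low <= divergence(sample_word(rng)) <= d_high:
            hits += 1
    return hits / trials


# Usage with a degenerate sampler and word length as a toy "divergence".
always_ab = lambda rng: "ab"
estimate = estimate_divergence_probability(always_ab, len, 2, 2, trials=50)
```

Such an estimate can then be compared against $1 - \epsilon$, keeping in mind that it carries sampling error and offers no formal guarantee.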

2.3  Running  Example 

We  present  below  a  running  example  to  illustrate  the  definitions  and  approach: 

• An alphabet composed of two sets of symbols $\Sigma = \Sigma_a \times \Sigma_A$, where $\Sigma_a = \{a, b, c\}$ and $\Sigma_A = \{A, C\}$;

• A plant model $A^p$ and a specification automaton $A^s$:

[Figure: the automata $A^p$ and $A^s$; recoverable transition labels: (*, C) and (c, *)]

where we use the special symbol * as a “don’t care” symbol. E.g., (b, *) represents either (b, A) or (b, C);

• A reference word: $w_{\mathrm{ref}} = (b,A)(b,C)(a,A)(c,C)$.

The goal is to design a controller that produces variations of $w_{\mathrm{ref}}$ satisfying $A^s$.

3  Theoretical  Hardness 

In this section, we analyze the computational hardness of the Control Improvisation problem (CI), as stated
in Definition 3. We will show that CI is not just undecidable, it is also not recursively enumerable (Turing-recognizable).
This result follows from the undecidability of the Control Improvisation Verification problem
(CIV), which informally is the verification version of CI, formalized below. The undecidability of CIV follows
by a reduction from the string-existence problem for probabilistic finite automata [23,10].

3.1  Definitions  and  Background 

We define here the problem of verifying a candidate solution to CI.

Definition 4. (Control Improvisation Verification Problem, CIV) The input to the control improvisation
verification problem is a tuple $(A^p, A^s, n, w_{\mathrm{ref}}, d_{w_{\mathrm{ref}}}, I, \epsilon, \rho, A^i)$ where $A^p$ is a (possibly non-deterministic)
plant FSA, $A^s$ is a specification FSA, $n \in \mathbb{N}$, $w_{\mathrm{ref}}$ is an accepting trace of $A^s$ of length $n$, $d_{w_{\mathrm{ref}}}$ its associated
creativity measure, $I = [\underline{d}, \overline{d}]$ is an interval of $\mathbb{R}$, $\epsilon \in (0,1)$, $\rho \in (0,1]$, and $A^i$ is a stochastic process generating words
$w$ in $\Sigma^*$. Given this input, the problem is to determine for each generated $w$ whether or not the following
conditions hold:

(a) Minimal Length: $|w| \geq n$;

(b) Safety: $w$ is an accepting trace of $A^s \| A^p$;

(c) Randomness: The measure of $w$ is smaller than $\rho$, and

(d) Bounded Divergence: $\Pr(d_{w_{\mathrm{ref}}}(w) \in [\underline{d}, \overline{d}]) > 1 - \epsilon$.

It  is  well  known  in  computability  theory  that  a  language  is  recursively  enumerable  (Turing-recognizable) 
if  and  only  if  it  is  verifiable.  Informally,  a  language  L  is  verifiable  if  there  exists  a  decision  procedure  that, 
given  a  problem  instance  P  in  L  and  a  candidate  solution  S  to  that  instance,  outputs  YES  if  S  is  a  solution 
to  P ,  and  NO  otherwise.  From  this  classic  result,  it  follows  that: 

Proposition 1. CI is recursively enumerable if and only if CIV is decidable.

We therefore turn our focus to analyzing the decidability of CIV. In particular, we show that CIV is undecidable
using a reduction from the string-existence problem for probabilistic finite automata (PFAs) [23,10].
For this, we first introduce the notion of a PFA, a simple stochastic process.

Definition  5.  (Probabilistic  Finite  Automaton  JUj/)  A  probabilistic  finite  automaton  (PFA)  is  a  5-tuple 
$(Q, \Sigma, T, s, f)$ where $Q$ is a finite set of states, $\Sigma$ is the input alphabet, $T$ is a set of $|Q| \times |Q|$ row-stochastic
transition matrices, one for each symbol in $\Sigma$, $s \in Q$ is the initial state, and $f \in Q$ is the accepting state.

The automaton occupies a single state in $Q$ at any given point of time. It begins in state $s$, transitions from
state to state based on the current input symbol and the distribution defined by the corresponding stochastic
transition matrix, and halts when it transitions to the accepting state $f$. It can be assumed that $f$ is an
absorbing state, meaning that the automaton stays in $f$ after reaching it.

A PFA accepts a string $w \in \Sigma^*$ if the automaton ends in the accepting state after reading string $w$,
otherwise it rejects it. For any finite-length string $w$ accepted by a PFA, there is an associated probability
with which $w$ is accepted. Given these notions, the problem of interest in this paper is the following one:

Definition 6. (String-Existence Problem for PFAs [22]) Given a probabilistic finite automaton (PFA), decide
whether or not there is some input string $w \in \Sigma^*$ such that the given PFA accepts that string with
probability exceeding some input threshold $r$.

Paz  [23]  established  the  undecidability  of  this  problem.  Later,  Condon  and  Lipton  [10]  gave  an  alternative 
proof.  More  recently,  Blondel  and  Canterini  showed  that  the  problem  remains  undecidable  even  when  the 
alphabet $\Sigma$ has just two letters [5].

We  also  mention  here  that  the  above  string-existence  problem  is  essentially  equivalent  to  the  problem  of 
probabilistic planning, as shown by Madani et al. [22] (which is therefore also undecidable). The probabilistic
planning  problem  is  relevant  to  our  setting  since  it  is  a  form  of  controller  synthesis:  a  plan  is  a  string  that 
is  a  sequence  of  actions  navigating  from  an  initial  state  to  a  goal  state. 

3.2  CIV  is  Undecidable 

We  now  give  our  reduction  of  the  string-existence  problem  of  PFAs  to  CIV,  which  yields  the  following 
theorem. 

Theorem  1.  CIV  is  undecidable. 

Proof: Consider an instance of the PFA string-existence problem: a pair $(\mathcal{P}, r)$, where $\mathcal{P}$ is a PFA.

We create an instance of CIV as a tuple $(A^p, A^s, n, w_{\mathrm{ref}}, d_{w_{\mathrm{ref}}}, I, \epsilon, \rho, A^i)$ as follows:

- $A^i$ equals $\mathcal{P}$ with a slight modification where we add one more letter $\xi$ to $\mathcal{P}$'s alphabet and direct all
strings with any occurrence of that letter to a non-accepting sink state with probability-one transitions;

- $\epsilon$ is set to $1 - r$;

- Both $A^p$ and $A^s$ are set to be the universal automaton that accepts all strings in $\Sigma^*$;

- $n = 1$;

- $w_{\mathrm{ref}}$ is any string with the letter $\xi$;

- $d_{w_{\mathrm{ref}}}(w)$ is set to be 0 if the string $w$ contains $\xi$, otherwise 1;

- $I$ is set to be $[1,1]$, and

- $\rho$ is set to be 1.

Note that conditions (a)-(c) are trivially satisfied, so the CIV problem reduces to one of checking (d):
that $\Pr(d_{w_{\mathrm{ref}}}(w) = 1) > r$. The strings $w$ generated by $A^i$ satisfying $d_{w_{\mathrm{ref}}}(w) = 1$ are exactly those that
are accepted by $\mathcal{P}$. Therefore, there is a string accepted by $\mathcal{P}$ with probability exceeding $r$ if and only if
condition (d) is satisfied, i.e., iff $A^i$ is a valid solution to the CIV problem. □

3.3  Hardness  of  Control  Improvisation 

Proposition 1 and Theorem 1 taken together imply the following hardness result:

Theorem 2. The Control Improvisation Problem (CI) is not Recursively Enumerable.

We  make  some  observations  about  this  result  and  its  implications: 

- First, we note from the proof of Theorem 1 that the hardness stems from the presence of condition (d).
As we will show in the following section, it is easy to synthesize an $A^i$ that satisfies conditions (a)-(c).
This indicates that one approach toward making the problem decidable would be to somewhat relax
condition (d), for example, by imposing additional conditions on the form of the divergence measure
$d_{w_{\mathrm{ref}}}$.

- Second, we note that the class of stochastic process required to obtain the hardness result was relatively
weak: a probabilistic finite automaton (PFA). It has been observed that the PFA is similar in expressive
power to other models widely used in planning and optimization, such as partially-observable Markov
Decision Processes (POMDPs) [22]. Therefore, one approach to make the problem simpler is to restrict
the class of stochastic processes even more, to a model that is less expressive than arbitrary PFAs.

Given the computational hardness of CI, in the following section we present an approach to solve the
problem that works well in practice. Our approach synthesizes a controller $A^i$ that ensures that conditions
(a)-(c) hold, and uses heuristics to satisfy condition (d) in practice (without theoretical guarantees).

4  Approach 

Our  approach  has  three  components: 

1. Generalization from the reference sequence: we compute an FSA $A^g$ that accepts $w_{\mathrm{ref}}$ and variations of it;

2. Safety Supervision: we compose $A^g$ with $A^p$, adding safe transitions with respect to $A^s$ where needed,
and

3. Divergence Supervision: we tune transition probabilities in the modified $A^g \| A^p$ with respect to a creativity
measure that we define.

We  now  describe  each  of  these  components  in  more  depth. 


4.1  Generalization  using  Factor  Oracles 

The core of our improvisation approach is based on the factor oracle (FO) structure [2]. A factor oracle is a
compact automaton representation of all contiguous subwords (factors) contained in a word $w = \sigma_1 \sigma_2 \cdots \sigma_n$.
It has $|w| + 1$ states, all accepting, and its transitions can be categorized into

1. Direct transitions of the form $s_i \xrightarrow{\sigma_{i+1}} s_{i+1}$;

2. Forward transitions of the form $s_i \xrightarrow{\sigma} s_j$ where $j > i + 1$ and $\sigma$ is some letter in $w$;

3. Backward transitions, also called suffix links, of the form $s_i \xrightarrow{\epsilon} s_j$ with $j < i$.
The details of the construction of factor oracles can be found in [9]. Some properties of FOs are as follows:

• An accepting word that takes only direct transitions is a prefix of $w$;

• Factors of $w$ are accepting words taking only direct and forward transitions;

• Finite concatenations of factors of $w$ are accepting words taking all three types of transitions.

These properties make the FO a suitable structure to generalize $w_{\mathrm{ref}}$, so a first step to solve the control
improvisation problem is to define $A^g = FO(w_{\mathrm{ref}})$. In Figure 1 we show the factor oracle obtained from the
reference word bbac.
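
For concreteness, the following Python sketch builds a factor oracle with the standard online construction from the factor oracle literature (one state added per letter, suffix links followed to add forward transitions). The function names and the list-of-dicts encoding are choices made for this example; the text itself defers construction details to the cited references.

```python
def factor_oracle(word):
    """Build the factor oracle of `word`.

    Returns (trans, sfx): trans[i] maps a letter to the target state of the
    direct or forward transition leaving state i; sfx[i] is the target of
    the suffix link of state i (-1 for the initial state, which has none).
    """
    n = len(word)
    trans = [dict() for _ in range(n + 1)]
    sfx = [-1] * (n + 1)
    for i, a in enumerate(word):
        trans[i][a] = i + 1            # direct transition s_i -> s_{i+1}
        k = sfx[i]
        while k > -1 and a not in trans[k]:
            trans[k][a] = i + 1        # forward transition s_k -> s_{i+1}
            k = sfx[k]
        sfx[i + 1] = 0 if k == -1 else trans[k][a]
    return trans, sfx


def fo_accepts(trans, word):
    """Follow direct/forward transitions only; all states are accepting."""
    state = 0
    for a in word:
        if a not in trans[state]:
            return False
        state = trans[state][a]
    return True


# Usage on the reference word of the running example (letters only).
trans, sfx = factor_oracle("bbac")
```

On `bbac`, the initial state receives forward transitions on `a` and `c`, and the suffix link of state 3 points back to state 0.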


Fig.  1.  Factor  Oracle  improviser  obtained  from  the  reference  word  bbac. 


4.2  Safety  Supervision 


Even though $w_{\mathrm{ref}}$ is an accepting word for the plant $A^p$ and specifications $A^s$, there is no guarantee that
its generalization $A^g$ composed with $A^p$ is non-blocking for $A^s$. However, assuming that there exists a non-blocking
memoryless controller $A^c_{\max}$ for $A^p \| A^s$ (something that can be checked using standard supervisory
control [7] and which is guaranteed by the existence of $w_{\mathrm{ref}}$), it is always possible to make $A^p \| A^g \| A^s$ non-blocking
by adding transitions as follows. Let $(q, c, s)$ be a blocking state of $A^p \| A^g \| A^s$. Since $A^c_{\max}$ is a
non-blocking memoryless controller, there exists a non-blocking transition $(q, s) \xrightarrow{\sigma} (q', s')$ in $A^p \| A^s$ for some
$\sigma \in \Sigma$. Hence we can pick some state $c'$ in $A^g$ and add the transition $c \xrightarrow{\sigma} c'$ to the transition relation of
$A^g$. This effectively adds the transition $(q, c, s) \xrightarrow{\sigma} (q', c', s')$ in $A^p \| A^g \| A^s$. This procedure is repeated until
no blocking state can be found in $A^p \| A^g \| A^s$.

To illustrate this construction, consider the automata $A^p$, $A^s$ defined in Sec. 2.3 and $A^g$ in Fig. 1. Constructing
the product $A^p \| A^s \| A^g$, we find that the run

$(q_0, s_0, c_0) \rightarrow (q_1, s_0, c_1) \rightarrow (q_0, s_0, c_2) \rightarrow (q_1, s_1, c_3) \rightarrow (q_1, s_1, c_0)$

leads to a blocking state $(q_1, s_1, c_0)$, as $A^s$ requires a $c$ transition, whereas $A^g$ only permits an $a$ or a $b$. Hence
we add the transition $c_0 \xrightarrow{c} c_1$.

One  remaining  question  is  how  to  pick  the  un-blocking  transition  when  more  than  one  choice  is  possible. 
When  this  makes  sense,  we  should  add  a  transition  which  is  close  to  an  existing  transition  of  A9 .  For  example, 
in our music application where transitions correspond to note events, we can pick notes with the closest
pitch  (or  frequency)  or  duration. 
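
One possible reading of the "closest transition" heuristic is sketched below for pitches only. It assumes, purely for illustration, that pitches are encoded as integers (e.g. MIDI note numbers) and that the set of safe candidate pitches has already been computed from $A^p \| A^s$; the function name `closest_safe_pitch` is hypothetical.

```python
def closest_safe_pitch(existing_pitches, safe_pitches):
    """Pick the safe pitch nearest to any pitch already leaving the state.

    existing_pitches: pitches labelling the current outgoing transitions.
    safe_pitches: candidate pitches that would unblock the state.
    Ties are resolved in favour of the earliest candidate.
    """
    return min(
        safe_pitches,
        key=lambda s: min(abs(s - e) for e in existing_pitches),
    )


# Usage: 62 is two semitones away from both existing pitches.
choice = closest_safe_pitch([60, 64], [55, 62, 70])
```

The same pattern applies to durations by swapping the distance function.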


4.3  Divergence  Supervision  via  Probability  Assignment 

The last step is to define transition probabilities satisfying the Randomness and Bounded Divergence requirements.
We begin by concretizing the creativity divergence that we use, which is a variant of the Normalized
Compression Distance introduced in mj and is based on the theory of Kolmogorov complexity. The Kolmogorov
Complexity of an object $x$ (denoted $K(x)$) is defined as the length of the shortest compressed code
to which $x$ can be losslessly reduced. The Kolmogorov Complexity of $y$ given $x$ (denoted $K(y|x)$) is the length
of the shortest compressed code to which $y$ can be losslessly reduced assuming knowledge of $x$. In practice,
$K(x)$ is not computable, and so is typically approximated by $C(x)$ where $C(x) = \mathrm{length}(\mathrm{compress}(x))$ for
some compression algorithm compress, and $K(y|x)$ can be approximated by $C(y|x) = C(xy) - C(x)$ [8]. Then
the Normalized Compression Distance (NCD) [5Tj between $x$ and $y$ is defined as


\[
\mathrm{NCD}(x, y) = \frac{\max\left(C(x|y),\, C(y|x)\right)}{\max\left(C(x),\, C(y)\right)}
\]


Informally, it estimates 1 minus the mutual information in $x$ and $y$. In our case however, the amount of
information in an improvisation that is not in the reference trace is of more interest than mutual information,
hence we define the creativity divergence based on the asymmetric quantity $\frac{C(y|x)}{C(y)}$ as follows.

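Definition 7 (Creativity Divergence $d_{w_{\mathrm{ref}}}$).

\[
d_{w_{\mathrm{ref}}}(w) = \frac{C(w|w_{\mathrm{ref}})}{C(w)} - \frac{C(w_{\mathrm{ref}}|w_{\mathrm{ref}})}{C(w_{\mathrm{ref}})}
\]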

The second term in the sum ensures that $d_{w_{\mathrm{ref}}}(w_{\mathrm{ref}}) = 0$. In our application we used the LZW compression
algorithm to compute $C(\cdot)$.
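
A minimal sketch of this divergence is given below. It assumes that $C(x)$ is taken to be the number of codes emitted by a textbook LZW encoder (the text only states that LZW was used, not how the length was measured) and that both words are non-empty; the function names are hypothetical.

```python
def lzw_code_count(seq):
    """Number of codes a textbook LZW encoder emits on seq (stand-in for C)."""
    seq = list(seq)
    table = {(s,) for s in seq}      # dictionary initialised with single symbols
    codes, current = 0, ()
    for s in seq:
        candidate = current + (s,)
        if candidate in table:
            current = candidate      # keep extending the current phrase
        else:
            codes += 1               # emit the code for `current`
            table.add(candidate)
            current = (s,)
    return codes + (1 if current else 0)


def conditional_length(y, x):
    """C(y|x) approximated as C(xy) - C(x)."""
    return lzw_code_count(list(x) + list(y)) - lzw_code_count(x)


def creativity_divergence(w, w_ref):
    """d_{w_ref}(w) = C(w|w_ref)/C(w) - C(w_ref|w_ref)/C(w_ref)."""
    return (conditional_length(w, w_ref) / lzw_code_count(w)
            - conditional_length(w_ref, w_ref) / lzw_code_count(w_ref))


# Usage: the reference word has zero divergence from itself by construction.
d_self = creativity_divergence("bbac", "bbac")
```

Because the symbols only need to be hashable, the same functions work on sequences of (pitch, duration) pairs as well as on strings.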

Finally, a simple way to assign probabilities to transitions in a factor oracle (FO) is as follows. Recall that
traversing the n + 1 states in sequence by taking direct transitions reproduces w_ref. Improvisation, i.e.,
variation from the original sequence, is obtained by randomly taking forward transitions or backward
transitions. Thus, the higher the probability of taking direct transitions, the more similar the output is to the
original sequence. In our implementation, we assign probability p to each direct transition, so that the
improviser replicates w_ref when p = 1, and distribute the remaining probability 1 − p equally among the other
outgoing forward or backward transitions. We assume that p is such that the direct transition is always the
most probable. This provides a simple parameter controlling how different the improvised sequence is from w_ref.
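
A minimal sketch of this assignment is given below. The function name and the handling of the two corner cases (a state without a direct transition, and a state with only a direct transition) are assumptions made for illustration; the text above does not specify them.

```python
def transition_probabilities(direct, others, p):
    """Map each outgoing transition of one state to its probability.

    direct: identifier of the direct transition, or None if the state has none.
    others: identifiers of the other forward or backward transitions.
    p:      replication probability given to the direct transition.
    """
    if direct is None:
        # Assumption: without a direct transition, choose uniformly.
        return {t: 1 / len(others) for t in others}
    if not others:
        # Assumption: a lone direct transition is taken with certainty.
        return {direct: 1.0}
    probs = {t: (1 - p) / len(others) for t in others}
    probs[direct] = p
    return probs
```

For example, with three other transitions and p = 0.7, the direct transition receives 0.7 and each of the others receives 0.1, so the direct transition remains the most probable.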

The  overall  supervision  process  for  solving  the  control  improvisation  problem  is  summarized  below: 


1. Maintain a sequence (q_0, c_0, s_0)(q_1, c_1, s_1) ... (q_k, c_k, s_k) of states of (A^p || A^g) || A^s and a word
w_k = σ_0 σ_1 ... σ_k.

2. If k > n and s_k is accepting, return w = w_k.

3. Else, if (q_k, c_k, s_k) has outgoing transitions (non-blocking), assign probabilities according to the replication
probability p and pick σ_{k+1} and (q_{k+1}, c_{k+1}, s_{k+1}).

4. Else, if (q_k, c_k, s_k) is blocking, pick a safe σ_{k+1} and (q_{k+1}, c_{k+1}, s_{k+1}) as defined in Section 4.2.
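
The four steps can be sketched as a generic loop. In this sketch the composed automaton is abstracted by three hypothetical callables (`successors`, `safe_move`, `is_accepting`) that are not part of the paper; the minimal-length test is written as "at least n symbols emitted", which is an assumption about how k is indexed; and `max_steps` is a practical guard added for illustration.

```python
import random


def improvise(initial, successors, safe_move, is_accepting, n, p,
              rng=None, max_steps=10_000):
    """Random walk over a composed automaton, following steps 1-4.

    successors(state) -> list of (symbol, next_state, is_direct)  (hypothetical)
    safe_move(state)  -> (symbol, next_state), used when blocked   (hypothetical)
    """
    rng = rng or random.Random()
    state, word = initial, []                          # step 1
    for _ in range(max_steps):
        if len(word) >= n and is_accepting(state):     # step 2 (assumed indexing)
            return word
        moves = successors(state)
        if moves:                                      # step 3: non-blocking
            others = sum(1 for m in moves if not m[2])
            has_direct = others < len(moves)
            if has_direct and others:
                weights = [p if m[2] else (1 - p) / others for m in moves]
            else:
                weights = [1.0] * len(moves)           # assumption: uniform
            symbol, state, _ = rng.choices(moves, weights=weights)[0]
        else:                                          # step 4: blocking
            symbol, state = safe_move(state)
        word.append(symbol)
    raise RuntimeError("no accepting state reached within max_steps")
```

With p = 1 and a chain of direct transitions spelling the reference word, the loop returns the reference word itself; lowering p makes detours through the other transitions more likely.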


Theorem 3. Assume that A^p || A^s is non-blocking, n > log ρ / log p, and d_{w_ref} is measurable. Then the stochastic
process defined above solves the control improvisation problem (A^p, A^s, n, w_ref, d_{w_ref}, I, ε, ρ) with probability 1
for some ε.


Proof: Since A^p || A^s is non-blocking, Section 4.2 showed that (A^p || A^g) augmented with safe transitions is
non-blocking as well, which means that an accepting state of A^p || A^g || A^s is always reachable in a finite
number of steps. Each transition probability is non-zero, hence an accepting state must be reached in a
finite number of steps with probability 1. By step 2, the process then stops and returns an accepting word
satisfying the minimal length criterion. As direct transitions always have the highest probability, w_ref is
returned with the highest probability. This probability is equal to p^n, which is smaller than ρ when
n > log ρ / log p (recall that log p < 0), hence the randomness criterion is met. Finally, since d_{w_ref} is
measurable, so is the event d_{w_ref}(w) ∈ [\underline{d}, \overline{d}], which means that there exists an ε > 0
such that Pr(d_{w_ref}(w) ∈ [\underline{d}, \overline{d}]) > 1 − ε. □

Note that this does not provide a fully constructive solution to the control improvisation problem, as ε is a
part of the problem. However, it provides a reasonable empirical solution: for a given replication probability p,
one can generate a population of improvisations, estimate ε for a given [\underline{d}, \overline{d}], and repeat
the process, tuning p accordingly, until a satisfactory result is obtained. In our experiments, we obtained narrow
creativity intervals with ε = 5% with 100 improvisations (see Section 6).
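
The empirical procedure can be sketched as follows. Here `sample_divergence` is a hypothetical callable standing in for "generate one improvisation with replication probability p and return its creativity divergence", and the list of candidate values of p and the stopping rule are assumptions made for illustration.

```python
def estimate_epsilon(divergences, d_low, d_high):
    """Fraction of sampled divergences falling outside [d_low, d_high]."""
    if not divergences:
        raise ValueError("need at least one sample")
    outside = sum(1 for d in divergences if not (d_low <= d <= d_high))
    return outside / len(divergences)


def tune_replication_probability(sample_divergence, candidates,
                                 d_low, d_high, target_eps, runs=100):
    """Return the first (p, eps) whose estimated eps meets the target, else None.

    sample_divergence(p) is a hypothetical callable: it generates one
    improvisation with replication probability p and returns its divergence.
    """
    for p in candidates:
        population = [sample_divergence(p) for _ in range(runs)]
        eps = estimate_epsilon(population, d_low, d_high)
        if eps <= target_eps:
            return p, eps
    return None
```

The estimate is only as reliable as the population size allows; with 100 samples, ε is resolved in steps of 1%.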


5  Jazz  Control  Improvisation 


In  this  section,  we  apply  control  improvisation  to  Jazz  music. 


5.1  Musical  Notations 

We start with some musical vocabulary. The note is the atomic entity. It has two attributes: a pitch, which
represents its fundamental frequency, and a duration. A pitch is noted with a letter, from a to g, a number
from 0 to 8 denoting the octave, and an optional accidental #, not available for pitches b and e. Pitches are
ordered as: {a0, a0#, b0, c0, c0#, d0, ..., e8, f8, f8#, g8, g8#}. We call the difference between two consecutive
pitches a semi-tone. When the octave is not specified, the letter denotes the set of all pitches that differ only by
their octaves, e.g., c = {c0, c1, ..., c8}. A rest can be defined as a note with no pitch, or a silent note. The
set of durations is also a finite ordered set {eighth note (♪), quarter note (♩), half note, ...} in which two
consecutive durations differ by a factor of two: each duration lasts as long as two notes of the next shorter
duration, e.g., a quarter note lasts as long as two eighth notes.

A  chord  is  a  finite  set  of  pitches,  usually  noted  using  a  capital  letter  matching  one  of  its  pitch  elements  called 
its root, and additional letters characterizing its nature (major, minor, dominant, etc.).

A piece of jazz music can be simplified into a melody, a string of pitched notes and rests, aligned with an
accompaniment, a looping sequence of chords with given durations. The time unit is the beat and the piece is
divided into bars, which are sequences of 4 beats. We assume that the accompaniment is fixed and our goal is
to define an improviser for the melody. Hence, the plant FSA will model the behavior of the accompaniment,
without constraining the melody, and the specification FSA will set constraints on acceptable melodies
played together with the accompaniment. To encode all events in a score, we use an alphabet composed of
the cross-product of four alphabets: Σ = Σ_p × Σ_d × Σ_c × Σ_b, where

• Σ_p is the pitches alphabet, e.g., Σ_p = {ε, a0, a0#, b0, c0, ...},

• Σ_d is the durations alphabet, e.g., Σ_d = {eighth note (♪), quarter note (♩), half note, ...} with ♩ = 1 beat.
Note that Σ_d also includes fractional durations, e.g., for triplets, as discussed below;

• Σ_c is the chords alphabet, e.g., Σ_c = {C, C7, G, Emaj, Adim, ...},

• Σ_b is the beat alphabet. E.g., if the smallest duration (excluding fractional durations) is the eighth note,
i.e., half a beat, then Σ_b = {0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5}.

All automata in the following implicitly use the full alphabet Σ. However, each component alphabet is
meant to address one particular aspect of the music formalization, and we will construct the specification
automaton by composing different sub-automata that use these different component alphabets. Also,
note that this encoding of music is neither unique nor meant to be canonical, and other types of
alphabets can be used in place of, or to complement, the one we propose. E.g., we do not
consider here note velocity (i.e., the intensity of the sound of the note).
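
To make the encoding concrete, the sketch below turns a melody given as (pitch, duration) pairs into symbols of the product alphabet. The `Event` type and `encode` function are illustrative names, a rest is represented by `None`, and one chord per bar is assumed; the durations used in the usage note are likewise assumed, chosen so that the onsets agree with the word of Example 1.

```python
from fractions import Fraction
from typing import NamedTuple, Optional

BEATS_PER_BAR = 4


class Event(NamedTuple):
    pitch: Optional[str]   # None encodes a rest
    duration: Fraction     # in beats
    chord: str             # chord sounding at the onset
    beat: Fraction         # onset position within the bar, in [0, 4)


def encode(melody, bar_chords):
    """Encode (pitch, duration) pairs over a looping one-chord-per-bar accompaniment."""
    events, t = [], Fraction(0)
    for pitch, dur in melody:
        bar = int(t // BEATS_PER_BAR) % len(bar_chords)
        events.append(Event(pitch, Fraction(dur), bar_chords[bar], t % BEATS_PER_BAR))
        t += Fraction(dur)
    return events
```

For instance, `encode([("g4", 2), ("b4", 1), ("d5", 1), ("d5", 1), ("b5", 1), ("g4", 2)], ["G", "C"])` places the first three notes over G at beats 0, 2 and 3, and the last three over C at beats 0, 1 and 2.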

Example 1. We provide an example encoding of a simple score using the formalism defined above.
Consider the following extract:

[Score extract: a melody over the chords G and C]

It contains a melody and a chord progression, which are represented by the following word in our alphabet:

(g4,J,G,0) (b4,J,G,2) (d5,J,G,3) (d5,J,C,0) (b5,J,C,1) (g4,J,C,2)


5.2 Encoding Chord Progressions


The harmonic context of the melody is given by the chord progression (accompaniment). The plant FSA A^p
then encodes the events of specified chords at specified times. The basic idea of the encoding is to define as
many states as there can be events of the minimal possible duration in a bar, i.e., in four beats, and replicate
those states for as many bars as needed. A transition from one state q to another state q' is then possible
if a note of the proper duration exists and if there is no chord change within the duration of this note.
This construction is illustrated in Figure 2.


[Figure 2: automaton diagram whose transitions carry labels of the form (*, duration, chord, beat)]


Fig. 2. Chord progression automaton A^p of the example. It consists of an accompaniment looping on chord
C during 4 beats (1 bar) and chord G during 1 bar, with the duration alphabet restricted to quarter notes and
half notes.
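
A minimal sketch of this construction is given below, assuming one chord per bar and measuring time in slots of the minimal duration (one slot is a quarter note in the example of Fig. 2). The function name and the tuple layout of transitions are illustrative choices.

```python
def build_plant(bar_chords, durations, slots_per_bar=4):
    """Transitions (state, label, next_state) of a looping chord-progression automaton.

    bar_chords: one chord per bar (assumption), e.g. ["C", "G"].
    durations:  allowed note lengths, in slots of the minimal duration.
    A label is ("*", duration, chord, slot within the bar); "*" stands for any pitch.
    """
    total = len(bar_chords) * slots_per_bar
    chord_at = [bar_chords[s // slots_per_bar] for s in range(total)]
    transitions = []
    for s in range(total):
        for d in durations:
            covered = [chord_at[(s + i) % total] for i in range(d)]
            # Allow the note only if the chord does not change while it sounds.
            if all(c == chord_at[s] for c in covered):
                label = ("*", d, chord_at[s], s % slots_per_bar)
                transitions.append((s, label, (s + d) % total))
    return transitions
```

For the accompaniment of Fig. 2, `build_plant(["C", "G"], [1, 2])` yields 8 states and 14 transitions: a quarter note is possible from every state, while a half note is ruled out from the last slot of each bar because it would span a chord change.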


5.3  Rhythmic  and  Harmonic  Specifications 


The specification FSA encodes rhythmic and harmonic constraints involving notes in the melody, which
enforce some general structure and basic musical consistency. The following specifications are adapted and
simplified from the generic guidelines found in El-. We structure Jazz melodies into licks, defined informally
as short melodic phrases of pitched notes separated by either rests or long notes. We then require that licks
start on specific beats. E.g., start beats can be 0.5, 1.5, 2.5 or 3.5, i.e., off-beats. This specification can be
encoded in the automaton (a) in Fig. 3.

The second specification has to do with durations which are not multiples of the smallest duration. In
that case, we require that the duration be repeated until the total duration is such a multiple. The typical
example of this situation is the triplet, which is the concatenation of three notes of the same fractional
duration. Without loss of generality, we model only this case, shown as the FSA (b) in Fig. 3, as other fractional
durations are dealt with in a similar manner.

Finally,  we  define  constraints  on  the  pitches  of  the  notes  in  the  melody.  The  pitched  notes  are  classified 
based  on  their  accompanying  chord.  We  follow  the  three  primary  tone  classifications  as  described  in  US: 


• Chord tone: a pitch belonging to the current chord;

• Color tone: a pitch that does not belong to the current chord but complements and creates euphony
with the current chord;

• Approach tone: neither a chord nor a color tone, followed by a pitched note that differs by exactly
1 semitone.


[Figure 3: three automaton diagrams: (a) A^s_1: Licks; (b) A^s_2: Triplets; (c) A^s_3: Pitches]


Fig. 3. Specification automata A^s = A^s_1 || A^s_2 || A^s_3. “rest” indicates a rest in the melody of any duration,
“start-beat” indicates a label of the form (*, *, *, b) where b is a beat value for which a lick can start, “short”
indicates a note in the melody of short duration, e.g., of duration less than or equal to a beat (♩). Conversely,
“long” indicates a note of a longer duration, e.g., strictly more than ♩. “good-pitch” indicates a note with a
pitch which is either a chord or a color tone, “approach” indicates an approach tone, “approach±1” indicates
an approach tone plus or minus a semi-tone.


This classification provides a set of “good” pitches for each chord. Color tones can be defined by a scale,
i.e., a set of pitches, which is overall “compatible” with the whole song, and from which we remove potential
“avoid” notes for the current chord. As an example, consider a song in the key of C. All notes in the major
scale {c, d, e, f, g, a, b} are safe to be played in general; however, if an F chord is played (composed of pitches
f, g#, c), we need to avoid b, which is highly dissonant with f. Hence the set of good notes in that situation
is {c, d, e, f, g, g#, a}. The approach tones make it possible to deviate “temporarily” from these good notes:
if a note not classified as good is played, it must be short and followed immediately by a good note that is no
further than a semi-tone away from it. We simplify this into the automaton (c) in Fig. 3.
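
The classification can be sketched as a small function. The helper names are illustrative, pitches are written as in Section 5.1 (letter, octave, optional #), and the chord, scale and avoid sets are passed in explicitly; the duration requirement on approach tones ("short") is left out of this sketch.

```python
# Pitch classes in the order used in Section 5.1 (each octave runs from a to g#).
NOTE_NAMES = ["a", "a#", "b", "c", "c#", "d", "d#", "e", "f", "f#", "g", "g#"]


def pitch_class(pitch):
    """'g4#' -> 'g#', 'c4' -> 'c' (octave must be a single digit)."""
    return pitch[0] + pitch[2:]


def pitch_index(pitch):
    """Position of a pitch in the ordering a0, a0#, b0, c0, ..."""
    return 12 * int(pitch[1]) + NOTE_NAMES.index(pitch_class(pitch))


def is_good(pitch, chord_tones, scale, avoid):
    pc = pitch_class(pitch)
    return pc in chord_tones or (pc in scale and pc not in avoid)


def classify(pitch, next_pitch, chord_tones, scale, avoid):
    """Return 'chord', 'color', 'approach' or 'other' for a pitched note."""
    pc = pitch_class(pitch)
    if pc in chord_tones:
        return "chord"
    if pc in scale and pc not in avoid:
        return "color"
    # Approach tone: resolved by a good note exactly one semi-tone away.
    if (next_pitch is not None
            and is_good(next_pitch, chord_tones, scale, avoid)
            and abs(pitch_index(next_pitch) - pitch_index(pitch)) == 1):
        return "approach"
    return "other"
```

With the chord {f, g#, c}, the scale {c, d, e, f, g, a, b} and the avoid set {b}, the good pitch classes are exactly {c, d, e, f, g, g#, a}, and b4 followed by c4 is classified as an approach tone.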


5.4  Specifications  Controllability 

In this section, we provide a supervisory controller for a plant automaton A^p modeling Jazz accompaniment
and a specification A^s = A^s_1 || A^s_2 || A^s_3 defined as above. We first note that in order to satisfy A^s_1, a
necessary condition is that a state of A^p with an outgoing transition labeled with (*, *, *, b), where
b ∈ start-beat, is accessible in k steps with k < n − 1. Then a controller for A^s_1 can take the following form:

[Diagram: controller automaton for A^s_1]

Regarding A^s_2, there is no need to make any assumption for the existence of a controller, as a conservative
strategy simply avoiding notes with fractional durations is always accepting. It is then easy to derive the
most permissive controller, which can additionally use fractional durations, provided that enough steps remain
with the same fractional duration to resolve to an integer duration before step n:

[Diagram: most permissive controller automaton for A^s_2]

Similarly, for A^s_3, a conservative strategy consisting in always picking “good” pitches is guaranteed to
produce an accepting trace. This requires that the set of good pitches is never empty. This is the case, as this
set includes in particular the chord tones, i.e., the pitches included in the current chord. A more permissive
controller could also take detours through approach tones at every step except for the last one.


Clearly, by composing the three strategies, it is easy to extract a valid controller for A^s. One possibility
is the following:

[Diagram: controller automaton with transitions labeled “start-beat” and “good-pitch ∧ short”]

In words, we use rests with appropriate durations for k steps until a starting beat is available, then we
use only short notes with “good” pitches until producing a trace of length n.
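
This conservative strategy can be sketched as follows, assuming that every event lasts an eighth note (half a beat), that licks may start on the off-beats 0.5, 1.5, 2.5 and 3.5, and that there is one chord per bar. The function name, the `good_pitches` mapping and the deterministic `pick` argument are illustrative; a randomized improviser would pick among the good pitches instead.

```python
from fractions import Fraction

EIGHTH = Fraction(1, 2)   # duration of every event in this sketch, in beats
START_BEATS = {Fraction(1, 2), Fraction(3, 2), Fraction(5, 2), Fraction(7, 2)}


def conservative_trace(n, bar_chords, good_pitches, pick=min):
    """Return n events (pitch, duration, chord, beat): rests, then short good notes.

    good_pitches maps each chord to a non-empty set of good pitches
    (chord tones and color tones); a rest is represented by None.
    """
    events, t, started = [], Fraction(0), False
    for _ in range(n):
        beat = t % 4
        chord = bar_chords[int(t // 4) % len(bar_chords)]
        if not started and beat in START_BEATS:
            started = True                      # a lick may start here
        pitch = pick(good_pitches[chord]) if started else None
        events.append((pitch, EIGHTH, chord, beat))
        t += EIGHTH
    return events
```

Starting from beat 0, the trace consists of one rest followed by short notes with good pitches, so here k = 1.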


5.5  Jazz  Improviser  Architecture 

The automaton obtained by composing the specifications above with the accompaniment automaton is
non-blocking; thus, we can apply the approach proposed in Section 4. However, our early experiments showed
that a single viewpoint system, in which the model predicted note durations and pitches together, was too
inflexible, in that the supervisory control phase would have to add too many edges to the factor oracle generator.
We therefore adopted a multiple viewpoint system, which improvises rhythms and melodic pitches separately.
The architecture presented in Figure 4 has been implemented in Python, using the Music21 library (footnote 4). We
present some results in the next section, as well as on a dedicated webpage (footnote 5).


[Figure 4: architecture diagram; the output is labeled “Generated improvisation”]

Fig. 4. Architecture of the improviser with multiple viewpoints.


6  Results 

We evaluated our improviser using a melody generated by Impro-Visor [18] over the standard 8-bar blues
chord progression. Using this melody as the reference trace, we generated an improvisation from a supervised
factor oracle and an improvisation from an unsupervised factor oracle, both with the probability of direct
transitions set to p = 0.8 (Figure 5). The reference melody and the supervised improvisation share
several similarities; however, the improvisation also deviates sufficiently from the original to be considered
unique.

4 http://web.mit.edu/music21

5 http://www.eecs.berkeley.edu/~donze/impro_page.html

[Figure 5: three score panels (a), (b), and (c)]

Fig. 5. Training melody (a), improvisation generated by supervised factor oracle (b), improvisation generated
by unsupervised factor oracle (c). Black notes are chord tones, green notes are color tones, blue notes are
approach tones, and red notes are other (undesirable) tones.

The unsupervised improvisation contains several notes (highlighted in red) that are not chord, color, or
approach tones and which are therefore undesirable. Conversely, supervisory control preserves the melodicism
of the improvisation, as evidenced by the lack of red notes in the supervised improvisation.

We also evaluate the supervised factor oracle improviser by the creativity divergence defined in
Section 4.3. In particular, we evaluate improvisational creativity with respect to rhythm (Figure 6). Overall,
the plot shows an inverse relationship between replication probability and average creativity, confirming that
creativity decreases as an improvisation more closely mimics the original melody and suggesting that it
is possible to bound creativity by an appropriate choice of replication probability.


[Figure 6: plot of average creativity against replication probability]

Fig. 6. Average creativity with respect to rhythm of 100 improvisations generated by factor oracle, with
ε = 5% confidence intervals.


7 Related Work

To  our  knowledge,  a  concept  like  control  improvisation  has  not  been  introduced  in  the  literature  before.  We 
survey  the  most  closely  related  work  in  the  areas  of  music  improvisation  and  control  theory. 

Broadly  speaking,  there  are  two  approaches  to  automatic  music  improvisation:  rule-based  and  data-driven. 
Rule-based  approaches  attempt  to  define  the  rules  of  “good”  improvisations  and  generate  pieces  of  music 
that  follow  those  rules.  However,  it  has  been  observed  that  it  is  difficult  to  come  up  with  the  “right”  rules, 
resulting in systems that are either too restrictive, limiting creativity, or too relaxed, thereby allowing
musical dissonance mm-. Consequently, recent musical improvisers tend towards data-driven or “predictive”
approaches that employ machine learning. These approaches learn a probabilistic model from music samples,
and use that model to generate new melodies. Examples of such models include stochastic context-free
grammars (SCFGs) [15117), hidden Markov models (HMMs) [IB], and universal predictors [13111216]. Some
approaches combine rule-based and data-driven approaches; e.g., the Impro-Visor system [18], based on SCFGs,
has rules learned from training licks through grammatical inference.

It has been found that certain universal predictors outperform other stochastic models in producing
stylistically appropriate music [Tj. Universal predictors vary based on the data structures and algorithms
used, such as incremental parsing (IP) [2], inspired by dictionary-based compression algorithms from the
Lempel-Ziv family [3], probabilistic suffix trees (PST) [TJ], and factor oracles (FO) [2]. Among these, the
factor oracle has been found to have some advantages. Unlike both IP and PST, the factor oracle is both complete
(contains all factors of the given word) and can be constructed on the fly. For this reason, factor oracles are at
the core of the OMax improvisation system |^], developed at IRCAM, which has been used in a number of
performances.

Our approach extends this state of the art by providing a way to (i) enforce certain “safety” rules on
the generated melody, and (ii) bound the “creativity” divergence from the original melody. Also, many
of the improvisers discussed rely on a single viewpoint system. In other words, they attempt to encapsulate
and improvise all aspects (rhythm, pitches, volume, etc.) of an improvisation simultaneously. For example,
in QQ the alphabet of the prediction model is the cross product of the beat each note starts on, the note’s
pitch, and the note’s duration. Following Conklin and Cleary m, we implemented a more flexible multiple
viewpoint approach to music generation in which note aspects are predicted separately and then aggregated.

In the area of control, our problem differs from traditional supervisory control as noted in Sections 1
and 2. Perhaps the closest existing notion is that of adaptive control 0], in which the controller deals with
a highly-dynamic environment by learning parameters of an environment model, and adapting the control
strategy to changing parameter values. While control improvisation does involve generalization (learning), it
must also meet additional constraints on randomness and divergence with respect to a reference trajectory.
Further, our work is so far limited to the purely discrete setting, whereas much of adaptive control is in the
continuous-time setting.

8  Conclusion 

We  introduced  the  concept  of  control  improvisation  and  presented  an  approach  to  solve  it.  Our  approach 
shows  promise  for  automatic  improvisation  of  Jazz  music.  We  believe  this  paper  is  just  a  first  step,  and  there 
is plenty of room for further work on both theory and applications of control improvisation. In particular, this
work can be seen as a variation of discrete supervisory control with additional randomness and improvising
requirements in the solution. Other types of controller synthesis problems could be adapted similarly, such as,
e.g., reactive controller synthesis with LTL specifications.

More  work  is  required  to  investigate  the  full  space  of  possible  creativity  divergence  measures  in  the 
context  of  various  applications.  Moreover,  there  is  room  for  improvement  over  the  base  approach  we  present 
in  this  paper,  e.g.,  to  provide  stronger  theoretical  guarantees  for  the  “bounded  distance”  condition. 

For  the  application  to  Jazz  improvisation,  one  can  consider  inferring  the  specification  automaton  from 
examples  of  “good”  and  “bad”  melodies.  Further,  it  would  be  interesting  to  consider  real-time  improvisation 
and  improvising  collectively  on  a  set  of  melodies  rather  than  just  a  solo  piece.  Finally,  while  this  paper 
presented  a  musical  application  of  the  factor  oracle  combined  with  creativity  selection,  there  are  other 
potential  application  domains  to  be  explored,  including  home  automation,  software  testing  (generating  strings 
to  test  a  program),  secure  control  systems,  and  emergency  management. 